data obtained from the LANDSAT 8 OLI satellite. Drought risk areas were classified using k-nn combined with the Spatial Autocorrelation method. The spectral vegetation indices used in the study are NDVI, SAVI, VHI, TCI and VCI. The results show a positive correlation and trend among vegetation indices influenced by seasonal dynamics and the characteristics of the High R.A. and Middle R.A. drought risk areas. SAVI has the highest correlation coefficient: 0.967 with High R.A. and 0.951 with Middle R.A. The Kappa accuracy test shows that SVM and k-nn have the same accuracy of 88.30. Spatial prediction using the IDW method shows that spectral vegetation index data initially flagged as outliers can be assigned to an aridity class by the k-nn method. The spatial connectivity among sub-districts experiencing drought was tested using Moran’s I analysis.

Keywords: Remote sensing; Spatial autocorrelation; Vegetation indices
1. INTRODUCTION

In Indonesia, the disasters that occur most often are classified into two types: hydrometeorological disasters (78%, or 11,650 cases) and geological disasters (22%, or 3,800 cases) [1, 2]. The negative impacts of a disaster are quantified with a disaster risk index, which is calculated from the threat (hazard), vulnerability, and capacity factors. Central Java Province is one of the regions in Indonesia with a high disaster risk index (based on its vulnerability and capacity) and has become a priority area for hydrometeorological disaster management. Aridity threat indices in Central Java fall in the high category over an area of 3.2 million hectares. The aridity index in Central Java was assessed using a meteorological parameter, the standardized precipitation index (SPI) [3]. The SPI method produces accurate aridity information, but it is considered a conventional method because it depends on the availability of rainfall data and is effective only for long-term aridity assessment. In areas with few meteorological stations, the aridity information produced by this method is inaccurate and invalid [4, 5]. According to data from the Indonesian Agency for Meteorology, Climatology and Geophysics (BMKG), Central Java Province has only six meteorological stations: Cilacap, Tegal, Semarang, Tanjung Mas, Ahmad Yani, and Banjarnegara. This limited number of stations yields aridity index information that is inaccurate, invalid, and unrepresentative of actual conditions, which also affects the SPI analysis even when it is combined with spatial interpolation.

Literature studies show that current technology makes it possible to detect and monitor aridity risk over wider areas, at lower cost and with high accuracy, by using vegetation indices extracted from remote sensing imagery [6-8]. Remote sensing has been used to monitor and model hydrometeorological phenomena, soil movement, disasters and land use. However, remote sensing data analysis requires techniques, models, and methods that transform spectral information into easily interpretable forms [9-13]. More than 150 spectral vegetation indices have been published in the scientific literature worldwide. Spectral vegetation indices can serve as indicators for aridity disaster assessment because they capture the spectral characteristics of aridity, a process that develops very slowly, lasts for a long time, and has an onset that cannot be pinpointed. A spectral vegetation index summarizes the intensity, duration, and spatial extent of aridity risk as a single number [14].

In this paper, we propose a framework that transforms spectral vegetation indices into aridity risk index information. The framework combines the k-nearest neighbor (k-nn) and Spatial Autocorrelation (SA) methods [15, 16], and we name it Satellite Imagery and Machine Learning Autocorrelation. We chose k-nn as the prediction algorithm of the framework because it can identify the nearest neighbors of each spectral vegetation index sample extracted from satellite image pixels and classify the sample accordingly: samples are assigned to classes according to the distance to their neighbors [17]. The SA method then measures the strength of spatial connectivity among the groups classified by k-nn, and the results are visualized as regional aridity pattern maps. Interpolation with the inverse distance weight (IDW) method is used to estimate aridity values at locations that were not sampled. SA is a statistical method that has been used to visualize and explore the spatial distribution patterns of vegetation, aridity levels and daily rainfall [17-19]. The rest of this paper is organized as follows: Section 2 discusses the theoretical background of the framework, Section 3 the research methods, Section 4 the results and discussion, and Section 5 the conclusions, followed by the references.


2. BACKGROUND 

A spectral vegetation index (VI) is a quantitative measure of how a vegetation canopy receives and reflects light, interpreted as the spectral characteristic of the vegetation. It draws on the red band of visible light (Red) and the invisible near-infrared band (NIR). The reflectance of Red and NIR light from a vegetation canopy is determined by the canopy structure (number and orientation of leaves) and its biochemical properties (chlorophyll and carotenoids). A VI is related to vegetation characteristics including: (1) vegetation type (trees, shrubs, grasslands, etc.), (2) cropping pattern, (3) plant growth phase, (4) leaf pigments, (5) plant water content, and (6) land. The vegetation indices used as aridity indicators, NDVI, SAVI, VCI, TCI and VHI, are computed with formulas (1)-(5) [20-27]:


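$$\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{Red}}{\rho_{NIR} + \rho_{Red}} = \frac{\text{band 5} - \text{band 4}}{\text{band 5} + \text{band 4}} \tag{1}$$

$$\mathrm{VCI} = \frac{\mathrm{NDVI} - \mathrm{NDVI}_{min}}{\mathrm{NDVI}_{max} - \mathrm{NDVI}_{min}} \times 100 \tag{2}$$

$$\mathrm{VHI} = \alpha\,\mathrm{VCI} + (1 - \alpha)\,\mathrm{TCI} \tag{3}$$

$$\mathrm{SAVI} = \frac{(\rho_{NIR} - \rho_{Red})(1 + L)}{\rho_{NIR} + \rho_{Red} + L} \tag{4}$$

$$\mathrm{TCI} = \frac{BT_{max} - BT}{BT_{max} - BT_{min}} \times 100 \tag{5}$$

where $\rho_{NIR}$ and $\rho_{Red}$ are the NIR (Landsat 8 OLI band 5) and Red (band 4) reflectances, $\mathrm{NDVI}_{min}$ and $\mathrm{NDVI}_{max}$ are the minimum and maximum NDVI values of the record, $BT$ is the brightness temperature with extremes $BT_{min}$ and $BT_{max}$, $\alpha$ is the weight of VCI relative to TCI, and $L$ is the soil-brightness correction factor (commonly 0.5).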
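Formulas (1)-(5) can be illustrated with a minimal Python sketch that operates on single pixel values. The function names, the default soil factor L = 0.5 and the default weight alpha = 0.5 are assumptions for illustration only; in practice the inputs would be per-pixel Landsat 8 OLI reflectance and brightness-temperature rasters, and this is not the authors' R code.

```python
def ndvi(nir: float, red: float) -> float:
    """Formula (1): normalized difference of NIR (band 5) and Red (band 4) reflectance."""
    return (nir - red) / (nir + red)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Formula (4): SAVI with soil-brightness factor L (0.5 assumed by default)."""
    return (nir - red) * (1 + soil_factor) / (nir + red + soil_factor)


def vci(ndvi_value: float, ndvi_min: float, ndvi_max: float) -> float:
    """Formula (2): current NDVI scaled between its historical extremes, in percent."""
    return (ndvi_value - ndvi_min) / (ndvi_max - ndvi_min) * 100


def tci(bt: float, bt_min: float, bt_max: float) -> float:
    """Formula (5): brightness temperature scaled so hotter pixels score lower, in percent."""
    return (bt_max - bt) / (bt_max - bt_min) * 100


def vhi(vci_value: float, tci_value: float, alpha: float = 0.5) -> float:
    """Formula (3): weighted blend of VCI and TCI (alpha = 0.5 assumed by default)."""
    return alpha * vci_value + (1 - alpha) * tci_value


if __name__ == "__main__":
    # Illustrative reflectances and temperatures, not values from the study.
    print(round(ndvi(0.4, 0.1), 3))   # 0.6
    print(round(savi(0.4, 0.1), 3))   # 0.45
    v = vci(0.6, 0.2, 0.8)
    t = tci(300.0, 290.0, 310.0)
    print(round(vhi(v, t), 2))        # 58.33
```

Applying the same functions element-wise to whole bands yields the index maps used in the rest of the framework.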
The classification and prediction stages were carried out on each vegetation index (NDVI, SAVI, VCI, TCI and VHI) using three methods: (1) k-nn, (2) random forest (RF) and (3) support vector machine (SVM). The k-nn technique is an accurate pattern-recognition and classification method for spatial objects previously identified from remote sensing data, particularly spectral vegetation indices [15, 28]. In remote sensing analysis, the k-nn algorithm works in two stages, i.e. (1) learning, which forms the data classes through the neighbor relations among the data, and (2) prediction, which assigns new data that do not yet have neighbor relations to one of the classes formed in the previous stage [15]. The main principle of k-nn is to classify spatially distributed data points according to the categories of their nearest neighbors. Let the training set $M = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$ contain $n$ data entities, where $x_i \in \mathbb{R}^d$ and $y_i \in Y = \{e_1, e_2, \dots, e_c\}$, the set of class labels. For a new data point $x$, the neighborhood formed by its $k$ nearest training points is denoted $N_k(x)$ [29]. A distance function between data points is used to compare the similarity of one sample point with another; here it is either the Euclidean distance or the Manhattan distance, given in (6) [30]:


$$d_E(x_m, x_n) = \sqrt{\sum_{i} (x_{mi} - x_{ni})^2}, \qquad d_M(x_m, x_n) = \sum_{i} \lvert x_{mi} - x_{ni} \rvert \tag{6}$$
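The two k-nn stages and the distances in (6) can be sketched in plain Python. This is a minimal illustration, not the authors' R implementation: the names `knn_predict`, `TRAIN_X` and `TRAIN_Y`, the default k = 3 and the toy (NDVI, SAVI) samples are hypothetical. A Minkowski distance is included because Figure 1 lists one; its default order p = 3 is an assumption.

```python
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """First form of equation (6): square root of summed squared differences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def manhattan(a: Vector, b: Vector) -> float:
    """Second form of equation (6): sum of absolute differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def minkowski(a: Vector, b: Vector, p: float = 3.0) -> float:
    """General Minkowski distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


def knn_predict(train_x: Sequence[Vector], train_y: Sequence[str], query: Vector,
                k: int = 3,
                distance: Callable[[Vector, Vector], float] = euclidean) -> str:
    """Return the majority label among the k training points nearest to `query`."""
    # Learning stage: the labelled training set itself is the model.
    # Prediction stage: rank training points by distance and vote within N_k(query).
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # Ties resolve to the label that appears first among the nearest neighbours.
    return votes.most_common(1)[0][0]


# Hypothetical (NDVI, SAVI) samples with illustrative labels, not study data.
TRAIN_X = [(0.10, 0.15), (0.12, 0.18), (0.15, 0.20),
           (0.45, 0.60), (0.50, 0.65), (0.55, 0.70)]
TRAIN_Y = ["High R.A."] * 3 + ["Middle R.A."] * 3

if __name__ == "__main__":
    print(knn_predict(TRAIN_X, TRAIN_Y, (0.13, 0.17)))                      # High R.A.
    print(knn_predict(TRAIN_X, TRAIN_Y, (0.52, 0.66), distance=manhattan))  # Middle R.A.
```

Passing the distance as a function keeps the vote logic identical across the Euclidean, Manhattan and Minkowski variants, so only the neighbourhood definition changes between runs.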


Tests of k-nn classification of aridity areas on spectral vegetation indices, particularly NDVI and SAVI, show that accuracy measured by Cohen's Kappa statistic is more stable, with a confidence level between 95 and 100 percent [31].
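$$\kappa = \frac{p_o - p_e}{1 - p_e} \tag{7}$$

where $p_o$ is the observed agreement (overall accuracy) and $p_e$ is the agreement expected by chance.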
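Cohen's Kappa is conventionally computed from a confusion matrix as (p_o - p_e) / (1 - p_e), with p_e obtained from the row and column marginals. The Python sketch below shows that computation alongside overall (confusion-matrix) accuracy; the function names and the toy labels are hypothetical, and multiplying by 100 gives values on the scale reported in this paper.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str]) -> Matrix:
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def overall_accuracy(matrix: Matrix) -> float:
    """Observed agreement p_o: share of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohens_kappa(matrix: Matrix) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement from the product of row and column marginals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Illustrative labels only (H = High R.A., M = Middle R.A.), not study data.
    actual = ["H", "H", "H", "M", "M", "M", "M", "M"]
    predicted = ["H", "H", "M", "M", "M", "M", "M", "H"]
    cm = confusion_matrix(actual, predicted, ["H", "M"])
    print(cm)                                # [[2, 1], [1, 4]]
    print(100 * overall_accuracy(cm))        # 75.0
    print(round(100 * cohens_kappa(cm), 2))  # 46.67
```

The example shows why Kappa is usually lower than raw accuracy: it discounts the agreement that the class proportions alone would produce.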
Aridity risk was determined by estimating the distribution of the aridity index across all regions with the IDW interpolation method. With IDW, the aridity index of a region that was not sampled is estimated from a weighted mean of the surrounding sampled regions, with weights that decrease with distance. IDW is computed using formula (8) [32], where $Z$ is the estimated value at the non-sampled location, $Z_i$ is the known value at sample $i$, $\beta$ is the weighting power and $\delta$ is the smoothing factor.


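$$Z = \frac{\sum_{i=1}^{n} Z_i / h_{ij}^{\beta}}{\sum_{i=1}^{n} 1 / h_{ij}^{\beta}}, \qquad h_{ij} = \sqrt{(\Delta x)^2 + (\Delta y)^2 + \delta^2} \tag{8}$$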


The value of $h_{ij}$ is the distance between the known and unknown sample points and is determined by the Euclidean equation in (8), where $\Delta x$ and $\Delta y$ are the differences in the x and y coordinates between the unknown point $j$ and the sample point $i$. The purpose of the Moran’s I calculation on vegetation index data is to determine the spatial connectivity of vegetation characteristics between sampling points. The Moran’s I equation is shown in (9):


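$$I = \frac{N \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}\right) \sum_{i=1}^{N} (x_i - \bar{x})^2} \tag{9}$$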


where $N$ is the number of observations, $x$ is the vegetation index used as the observation indicator, and $\bar{x}$ is its mean. The variable $w_{ij}$ is the weight between observation areas $i$ and $j$ at time $t$ [16].
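Equation (9) and the usual reading of its sign can be sketched in Python as below. This is a hedged illustration: `morans_i`, `interpret`, the binary adjacency matrix `W` and the toy values are hypothetical, and `interpret` only compares I with its expectation -1/(N-1) under spatial randomness, without the significance test a full analysis would include.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I as in equation (9)."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]                     # deviations x_i - mean
    w_total = sum(sum(row) for row in weights)          # sum of all w_ij
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))    # weighted cross-products
    variance_sum = sum(zi * zi for zi in z)            # sum of squared deviations
    return (n / w_total) * (cross / variance_sum)


def interpret(i_value: float, n: int, tol: float = 1e-9) -> str:
    """Compare I with its expectation -1/(N-1) under spatial randomness."""
    expected = -1.0 / (n - 1)
    if i_value > expected + tol:
        return "Positive Autocorrelation"
    if i_value < expected - tol:
        return "Negative Autocorrelation"
    return "No Autocorrelation"


# Hypothetical example: four regions in a row with binary adjacency weights.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

if __name__ == "__main__":
    trend = morans_i([1, 2, 3, 4], W)        # neighbours have similar values
    alternating = morans_i([1, 4, 1, 4], W)  # neighbours have opposite values
    print(round(trend, 3), interpret(trend, 4))
    print(round(alternating, 3), interpret(alternating, 4))
```

Smoothly varying values give a positive I (here about 0.333), while a checkerboard-like alternation gives a negative I (here -1.0), which is the contrast the Moran’s I test exploits.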


3. RESEARCH METHOD 

The research area consists of 346 sub-districts in Central Java Province, Indonesia. The data are monthly Landsat 8 OLI (Operational Land Imager) satellite images for the 2018-2019 period, obtained from the United States Geological Survey (USGS) at https://earthexplorer.usgs.gov/ with path/row 120/65. The proposed aridity risk assessment framework is presented in Figure 1, and the stages of the research are as follows:

- Data preprocessing: Landsat 8 OLI images consist of 11 bands, each covering a different wavelength range; together they span 0.43 to 12.51 µm. The vegetation index data are numeric and consist of NDVI, SAVI, VHI, TCI and VCI. The image bands are used to compute the vegetation indices with formulas (1)-(5).

- Data classification: correlation values among NDVI, SAVI, VHI, TCI and VCI are computed to examine the relationships between the indices and their distribution patterns and trends over time. The data of each vegetation index are divided into two sets, testing data (70% of the total) and learning data (30% of the total). The vegetation index data are then classified into two groups, high risk aridity and middle risk aridity, using the k-nn method.

- Performance test: SVM and RF classifiers are used as comparisons of classification performance. Classification and prediction of the vegetation indices into the High R.A. and Middle R.A. risk groups were done in the R programming language.

- Spatial analysis: the spatial distribution of the k-nn classification results is predicted with the inverse distance weight (IDW) method; the spatial connectivity between regions is measured with Moran’s I to determine the spatial distribution pattern of aridity risk in the research area; and local aridity indices are compiled from the k-nn spatial predictions.
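The correlation and data-splitting steps in the list above can be sketched as follows. This is a hedged Python illustration rather than the authors' R code; the names `pearson` and `split_rows`, the fixed random seed and the shuffle-then-cut split are assumptions. The split fraction is left as a parameter, and the usage line passes 0.7 for the testing share as the proportions are stated in the text.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_rows(rows: Sequence[T], first_fraction: float,
               seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly, then cut into a first_fraction part and the remainder."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # 70% testing / 30% learning, following the proportions stated in the text.
    testing, learning = split_rows(list(range(10)), 0.7)
    print(len(testing), len(learning))                    # 7 3
    print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```

Fixing the seed makes the split reproducible, so the classifiers compared later see exactly the same partition.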


[Figure 1 flowchart: data extraction from Landsat 8 OLI images (bands spanning 0.435 to 12.51 µm); data pre-processing and computation of vegetation indices; Pearson correlation analysis of vegetation indices; testing and learning data; classification with k-nearest neighbours (spectral Euclidean, Manhattan and Minkowski distances); comparison of model performance evaluation (random forest); inverse distance weighting; data vector map; local aridity index]

Figure 1. The proposed regional aridity risk assessment framework
The classification, prediction and data visualization procedures are shown in the following pseudocode:


Input: data = vegetation index values v_i and predicted values pred_v_i; mi = training data,
       mj = label data, ml = sample data, k = number of neighbours;
       location_data = coordinates of observations, location_sample = coordinates of sample points; power = 2
Output: class labels, Moran's I, data_idw

Classification (mi, mj, ml)
  for each sample s in ml do
    for each node t in mi do
      calculate the distance d(t, s)
    end for
    select the k nodes of mi with the smallest distances
    assign s the majority label of those nodes in mj
  end for

Moran's I
  a = 0, b = 0, W = 0
  for i = 1 to N do
    for j = 1 to N do
      a = a + w_ij * (x_i - mean_x) * (x_j - mean_x)
      W = W + w_ij
    end for
    b = b + (x_i - mean_x)^2
  end for
  I = (N / W) * (a / b)
  if I > -1/(N-1) then positive autocorrelation
  else if I < -1/(N-1) then negative autocorrelation
  else no autocorrelation (random pattern)
  end if

Inverse distance weight
  for each observation location o do
    for each sample location s = 1 to m do
      distance_s = |location_sample_s - location_data_o|
      weight_s = 1 / distance_s^power
      value_s = weight_s * data_s
    end for
    data_idw_o = sum(value_s) / sum(weight_s)
  end for
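The inverse distance weight block of the pseudocode can be made runnable as below. This is a minimal Python sketch, not the authors' R implementation; the function name `idw`, the (x, y) tuple representation and the `smoothing` argument are illustrative, and adding the smoothing factor δ under the square root is one assumed way of applying it (with the default of 0 the plain Euclidean distance is used).

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def idw(samples: Sequence[Point], values: Sequence[float], target: Point,
        power: float = 2.0, smoothing: float = 0.0) -> float:
    """Inverse distance weighted estimate at `target` from known sample values."""
    weighted_sum = 0.0
    weight_sum = 0.0
    for (sx, sy), z in zip(samples, values):
        dx, dy = target[0] - sx, target[1] - sy
        # Euclidean distance h_ij; the optional smoothing term is an assumption.
        h = (dx * dx + dy * dy + smoothing * smoothing) ** 0.5
        if h == 0.0:
            return z  # target coincides with a sample: keep the known value
        weight = 1.0 / h ** power  # weight = 1 / distance^power
        weighted_sum += weight * z
        weight_sum += weight
    return weighted_sum / weight_sum


if __name__ == "__main__":
    # Hypothetical sample points (x, y) with known index values, not study data.
    pts = [(0.0, 0.0), (2.0, 0.0)]
    vals = [10.0, 20.0]
    print(idw(pts, vals, (1.0, 0.0)))            # 15.0
    print(round(idw(pts, vals, (0.5, 0.0)), 6))  # 11.0
```

With power = 2, a target three times closer to one sample than to the other weights that sample nine times more heavily, which is why the second estimate sits close to 10.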


4. RESULTS AND DISCUSSION 

Vegetation index data follow the seasonal pattern of wet months (November to March), dry months (May to September), and transitional months (April and October) [33]. In addition to these seasonal cycles, the aridity risk areas are classified into high risk aridity (High R.A.) and middle risk aridity (Middle R.A.). The preprocessing step of the experiment analyzes the correlations and trends among the seasonal patterns and the risk-area classes. This analysis helps in understanding the temporal connectivity among the vegetation indices, seasonal changes and local aridity risk. The Pearson correlation analysis shows a positive correlation and trend between the vegetation indices, which are influenced by seasonal dynamics, and the characteristics of the High R.A. and Middle R.A. risk areas, as shown in Figure 2.

The analysis shows that all vegetation indices have a positive correlation and trend with High R.A. and Middle R.A. SAVI has the highest correlation coefficient: 0.967 with High R.A. and 0.951 with Middle R.A. SAVI is a vegetation index that indicates photosynthesis, biomass, and biogeochemical processes at the local scale even where the area has seasonal dynamics. VCI has the lowest correlation coefficient: 0.068 with High R.A. and 0.382 with Middle R.A. VCI is a vegetation index that indicates the diversity of local vegetation in relation to seasonal factors, as shown in Figure 2. The next step is to perform the k-nn analysis and test its accuracy with the confusion matrix method for k = 1 to k = 10. The accuracy for k = 1 to k = 10 is plotted as the curve in Figure 3.

The curve in Figure 3 shows that the lowest accuracy is 80.55 at k = 2, the average is 86.99, and the highest is 92.03 at k = 9. The next step compares the confusion-matrix accuracy and the Kappa statistic of the k-nearest neighbors (k-nn), support vector machine (SVM) and random forest (RF) classifications, as shown in Table 1. By confusion-matrix accuracy, SVM (96.00) is more accurate than RF and k-nn. By the Kappa statistic, SVM and k-nn have the same accuracy of 88.30.
Figure 3. Trend analysis of k-nn and accuracy testing using the confusion matrix method


Table 1. Comparison of data classification using other ML methods 


Classification method      Confusion matrix accuracy   Kappa
Support Vector Machine     96.00                       88.30
Random Forest              91.00                       75.00
k-nearest neighbours       92.03                       88.30


Classification data are analyzed with a boxplot diagram to determine: (1) the lowest observation value, (2) the lower quartile (Q1), (3) the median (Q2), (4) the upper quartile (Q3), and (5) outlier values. The boxplot analysis shows that the classification data have a wide range that extends beyond the range of the testing data at the lower limit (Q1) or the upper limit (Q3), as shown in Figure 4. The widening of the Q1 and Q2 ranges and the shift in the center of the boxplot are caused by data previously classed as outliers being included in the High R.A. or Middle R.A. class. Outlier points are visible in the High R.A. and Middle R.A. boxplots for both the prediction data and the Landsat imagery data; for example, the outliers in the k-nn prediction data are almost the same as those in the Landsat imagery data. Shifts in Q1, Q2, and Q3 are clearly visible in TCI, VCI and VHI. TCI and VCI indicate aridity caused by seasonal cycles and their effect on canopy temperature, leaf-surface stomatal movement and canopy humidity. The dynamics of TCI and VCI are driven by seasonal fluctuations that affect vegetation health, which is reflected in an increase of the VHI value. Figure 4 shows that most of the classification data are in the Middle R.A. class, as indicated by the length of the interquartile range (IQR). The former outliers that have moved into the High R.A. or Middle R.A. group change the curve peaks and trend lines in the line graph, where the red line represents High R.A. and the blue line represents Middle R.A. The NDVI and SAVI curves show a similar trend, with the second peak between 0.2 and 0.4, which indicates that some parts of the study area have vegetation with high photosynthetic activity, as shown in Figure 4. The knn.NDVI and knn.SAVI prediction lines show patterns that are almost identical to the NDVI and SAVI lines from the Landsat imagery data. The line curves in Figure 4 also show that the Middle R.A. line is higher and wider than the High R.A. line.
Figure 4. Most data are in the Middle R.A. class, as shown by the length of the interquartile range (IQR); the Middle R.A. curve is higher and wider than the High R.A. curve
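The boxplot quantities discussed above (minimum, Q1, median, Q3 and outliers) can be computed with Python's standard `statistics` module, as in the sketch below. The function names and sample values are hypothetical, and flagging outliers with the 1.5 × IQR (Tukey) fences is a common boxplot convention assumed here, not a rule stated in the text; quartile conventions also differ slightly between software packages.

```python
import statistics
from typing import Dict, List, Sequence


def five_number_summary(data: Sequence[float]) -> Dict[str, float]:
    """Minimum, Q1, median (Q2), Q3 and maximum of the observations."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": q2, "q3": q3, "max": max(data)}


def tukey_outliers(data: Sequence[float], k: float = 1.5) -> List[float]:
    """Points outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is the usual convention."""
    s = five_number_summary(data)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [x for x in data if x < low or x > high]


if __name__ == "__main__":
    # Illustrative values only, not study data.
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(five_number_summary(sample))  # q1 = 3.0, median = 5.0, q3 = 7.0
    print(tukey_outliers(sample))       # [100]
```

Once a point such as 100 is reassigned to a class rather than treated as an outlier, the quartiles of that class shift, which is the effect described for TCI, VCI and VHI.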


The next step is a spatial prediction of the classification results using the inverse distance weight (IDW) method, to visualize the spatial pattern of the vegetation indices and of the k-nn vegetation index values in areas that were not used as observation points. The spatial prediction results take the form of an aridity risk map at the sub-district level or lower, as shown in Figure 5. In Figure 5, several sub-districts that are shown as brighter blue nodes appear in a darker color. This shows that vegetation index data previously classed as outliers are, through the k-nn prediction process, included in the Middle R.A. (shades of blue) and High R.A. (shades of red) classes.

Natural phenomena indicate that spatial elements interact with each other, especially with neighboring spatial objects [30]. The results of the Moran's I test are shown in Table 2, which shows that all vegetation indices have positive spatial autocorrelation, interpreted as the presence of Middle R.A. and High R.A. areas across the entire observation area.
Figure 5. Comparison of the predicted High R.A. and Middle R.A. results with the data classification; the k-nn algorithm assigns outlier data to the vegetation index classes


Table 2. Analysis of Moran’s I on testing data and prediction vegetation indices data 


Vegetation index   Moran’s I   Interpretation
NDVI               0.996       Positive Autocorrelation
Knn-NDVI           0.996       Positive Autocorrelation
SAVI               0.996       Positive Autocorrelation
Knn-SAVI           0.996       Positive Autocorrelation
VHI                0.997       Positive Autocorrelation
Knn-VHI            0.995       Positive Autocorrelation
TCI                0.995       Positive Autocorrelation
Knn-TCI            0.995       Positive Autocorrelation
VCI                0.995       Positive Autocorrelation
Knn-VCI            0.995       Positive Autocorrelation


5. CONCLUSION 

The Pearson correlation analysis shows a positive correlation and trend among vegetation indices influenced by seasonal dynamics and the characteristics of the High R.A. and Middle R.A. drought risk areas. All vegetation indices have a positive correlation and trend with High R.A. and Middle R.A. Among the spectral vegetation indices, SAVI has the highest correlation: 0.967 with High R.A. and 0.951 with Middle R.A. SAVI indicates photosynthetic activity, biomass, and biogeochemical processes on a local scale even where the region has seasonal dynamics. VCI has the lowest correlation coefficient: 0.068 with High R.A. and 0.382 with Middle R.A. VCI indicates the diversity of local vegetation in relation to seasonal factors. Comparing the classification accuracy of k-nearest neighbors (k-nn), support vector machine (SVM) and random forest (RF) with the confusion matrix and Kappa methods shows that, by confusion-matrix accuracy, SVM (96.00) is more accurate than RF or k-nn, while by the Kappa statistic SVM and k-nn have the same accuracy of 88.30. The spatial prediction of drought risk maps at the sub-district level shows that spectral vegetation index data previously classed as outliers are assigned, through k-nn prediction, to the Middle R.A. and High R.A. classes. The Moran’s I test shows that all vegetation indices have positive spatial autocorrelation, which is interpreted as the presence of Middle R.A. and High R.A. areas across the entire observation area.